SNOW-1871175: Add support for specifying a schema string for DataFrame.create_dataframe
#2828
for i, c in enumerate(s):
    if c in ["<", "("]:
        bracket_depth += 1
    elif c in [">", ")"]:
        bracket_depth -= 1
        if bracket_depth < 0:
            raise ValueError(f"Mismatched bracket in '{s}'.")
I feel this bracket-check logic is repeated multiple times. Do you think it's possible to check that the brackets match only once, as an initial step over the whole input string, so that the downstream logic can focus only on extracting the names and types?
We need to parse the brackets anyway to split the fields and extract the names and types. The check for whether the bracket expression is valid is indeed duplicated, and we could remove it. But to keep the function self-contained, maybe let's still keep it? These cases are also covered in the tests.
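The trade-off discussed in this thread can be sketched as follows. This is only an illustration, not the PR's code: `check_brackets` and `split_top_level` are hypothetical names, and the sketch assumes that `<`, `>`, `(` and `)` are the only bracket characters in a schema string. Validation runs once, and the splitter then tracks depth purely to find top-level commas.

```python
from typing import List


def check_brackets(s: str) -> None:
    """Raise ValueError unless '<>' and '()' are balanced and properly nested."""
    closers = {">": "<", ")": "("}
    stack: List[str] = []
    for c in s:
        if c in "<(":
            stack.append(c)
        elif c in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[c]:
                raise ValueError(f"Mismatched bracket in '{s}'.")
    if stack:
        raise ValueError(f"Mismatched bracket in '{s}'.")


def split_top_level(s: str, sep: str = ",") -> List[str]:
    """Split on `sep` outside brackets. Assumes check_brackets(s) already passed."""
    parts: List[str] = []
    depth = 0
    start = 0
    for i, c in enumerate(s):
        if c in "<(":
            depth += 1
        elif c in ">)":
            depth -= 1
        elif c == sep and depth == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    parts.append(s[start:].strip())
    return parts
```

The cost of this split is the one raised in the reply above: `split_top_level` is no longer safe to call on its own, since it relies on the caller having validated the input first.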
tests/unit/test_types.py
Outdated
dt = type_string_to_type_object(
    " col1 : int , col2 : map< string , decimal(5,2) > "
)
- dt = type_string_to_type_object(
-     " col1 : int , col2 : map< string , decimal(5,2) > "
- )
+ dt = type_string_to_type_object(
+     " col1 : int , col2 : map< string , decimal( 5 , 2 ) > "
+ )
Can we add spacing here too?
- When passing a **string**, it can be either an *explicit* struct
  (e.g. ``"struct<a: int, b: string>"``) or an *implicit* struct
  (e.g. ``"a: int, b: string"``). Internally, the string is parsed and
  converted into a :class:`StructType` using Snowpark's type parsing.
Does this mean a valid struct must contain a col_name: data_type pair?
yes
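To make the "name: type pair" requirement concrete, here is a minimal sketch of how the explicit and implicit forms could both be reduced to a list of pairs. It is not Snowpark's implementation: `parse_struct_fields` and `_split_top_level` are hypothetical helpers, the type is kept as a raw string rather than converted to a type object, and bracket validation is assumed to have happened elsewhere.

```python
from typing import List, Tuple


def _split_top_level(s: str) -> List[str]:
    """Split on commas that are not nested inside '<>' or '()'."""
    parts: List[str] = []
    depth = 0
    start = 0
    for i, c in enumerate(s):
        if c in "<(":
            depth += 1
        elif c in ">)":
            depth -= 1
        elif c == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


def parse_struct_fields(schema: str) -> List[Tuple[str, str]]:
    """Return (column name, type string) pairs for either struct form."""
    s = schema.strip()
    # Explicit form: unwrap "struct<...>" so both forms share one code path.
    if s.lower().startswith("struct<") and s.endswith(">"):
        s = s[len("struct<"):-1]
    fields: List[Tuple[str, str]] = []
    for part in _split_top_level(s):
        # The first colon separates the name from the (possibly nested) type.
        name, sep, type_str = part.partition(":")
        name, type_str = name.strip(), type_str.strip()
        if not sep or not name or not type_str:
            raise ValueError(f"Expected 'name: type' but got '{part.strip()}'.")
        fields.append((name, type_str))
    return fields
```

Under these assumptions, `"struct<a: int, b: string>"` and `"a: int, b: string"` produce the same pairs, while a bare `"int"` is rejected because it has no column name.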
64fe8b4 to 1a5e325
1a5e325 to 677e733
Is this change able to parse "not null", e.g. "struct<i: integer not null>"?
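The conversation captured here does not say whether the PR handles this suffix. Purely as an illustration of what supporting it could involve, the sketch below peels a trailing "not null" off a field's type string. `split_nullability` is a hypothetical helper, and the assumption that the suffix is case-insensitive and always comes last in the field is mine, not the PR's.

```python
import re
from typing import Tuple

# Trailing "not null", case-insensitive, with flexible whitespace (assumed syntax).
_NOT_NULL = re.compile(r"\s+not\s+null\s*$", re.IGNORECASE)


def split_nullability(type_str: str) -> Tuple[str, bool]:
    """Return (type string without the suffix, nullable flag)."""
    match = _NOT_NULL.search(type_str)
    if match:
        return type_str[: match.start()].strip(), False
    return type_str.strip(), True
```

A parser could call a helper like this on each field's type string before resolving the type itself, and carry the flag into the resulting field's nullability.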
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1871175
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Support schema string